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EFFICIENT IMPLEMENTATION OF THE AI-REML ITERATION 
FOR VARIANCE COMPONENT QTL ANALYSIS 

KATERYNA MISHCHENKO, SVERKER HOLMGREN, AND LARS RONNEGARD 

Abstract. Regions in the genome that affect compiex traits, quantitative trait foci 
(QTL), can be identified using statisticai anaiysis of genetic and phenotypic data. 
When restricted maximum-likehhood (REML) models are used, the mapping proce- 
dure is normaUy computationally demanding. We develop a new efficient computa- 
tional scheme for QTL mapping using variance component analysis and the AI-REML 
algorithm. The algorithm uses an exact or approximative low-rank representation of 
the identity-by-descent matrix, which combined with the Woodbury formula for ma- 
trix inversion results in that the computations in the ALREML iteration body can 
be performed more efficiently. For cases where an exact low-rank representation of 
the IBD matrix is available a-priori, the improved ALREML algorithm normally runs 
almost twice as fast compared to the standard version. When an exact low-rank rep- 
resentation is not available, a truncated spectral decomposition is used to determine a 
low-rank approximation. We show that also in this case, the computational efficiency 
of the ALREML scheme can often be significantly improved. 



1. Introduction 

Traits that vary continuously are called quantitative. In general, such traits are af- 
fected by an interplay between multiple genetic factors and the environment. Most 
medically and economically important traits in humans, animals and plants are quanti- 
tative, and understanding the genetics behind them is of great importance. 
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The dissection of quantitative traits is generally performed by statistical analysis of 
genetic and phenotypic data for experimental populations. Such analysis can reveal 
quantitative trait loci, QTL, in the genome that affect the trait. In the field of QTL 



analysis (Lynch and Walsh, 1998), the use of variance component models are commonly 



used, see e.g. ( [Blangero et al., 2001D . Using such models, the statistical analysis is 
performed using maximum-likelihood (ML) or restricted maximum-likelihood (REML) 
estimators which have the advantage that they do not use specific assumptions on the 
design or balance of the data. These types of schemes are often considered to be statisti- 
cally efficient since they utilize all available data. On the other hand, the corresponding 
algorithms are also relatively computationally demanding since non-linear optimization 
problems must be solved using an iterative procedure. When searching for the most 
likely locations of the QTL, REML optimization problems must be solved for many 
genetic locations within an outer global optimization procedure. Furthermore, if the 
significance of the results is established using an experimental procedure like parametric 



bootstrap ( [Davison and Hinkley, 1997[ ), hundreds or thousands of QTL searches must be 
performed. This implies that the computational complexity for the numerical methods 
for solving the REML problems must be minimal for the QTL mapping to be viable. 

During the last two decades, specialized algorithms for variance component analysis in 
animal breeding settings have been developed and implemented in codes like ASREML, 



DMU and VCE, see ( Druet and Ducrocq, 2006 ) and references therein. The algorithms 
used in these codes have in common that they use a specific Newton-type iteration, 
theaverage information restricted maximum likelihood algorithm, Al-REML. However, 
the structure of the problems in traditional animal breeding applications is different from 
that of QTL analysis problems, and the computational schemes should be examined and 
possibly modified before they are applied in a QTL analysis setting. In this short paper 
we perform this type of investigation and develop a new efficient computational scheme 
for QTL mapping using variance component analysis and the AI-REML algorithm. This 
type scheme should then be applied within an efficient and robust optimization method 
for maximizing the likelihood in the AI-REML method, and also within an efficient and 
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robust global optimization algorithm for the search for the outer optimization problem 
where the most likely position of the QTL is determined. 

2. The AI-REML algorithm 
A general linear mixed model is given by 



where y is a vector of n observations, X is the n x nj design matrix for nj fixed effects, 
R is the n X rir design matrix for rir random effects, b is the vector of n/ unknown 
fixed effects, r is the vector of rir unknown random effects, and e is a vector of n 
residuals. In the QTL analysis setting, we assume that the entries of e are identically 
and independently distributed and there is a single observation for each individual in 
the pedigree. In this case, the covariance matrices are given by var{e) = lal and 
var{r) = Aa^, where A is referred to as the identity-by- descent (IBD) matrix. Using 
these assumptions, we have that 



Here, ci > and > and we assume that the phenotype follows a normal distribution 
with y MVN{Xb,V). At least two different procedures can be used for computing 
estimates of b and ai^2- In the standard codes mentioned above, the parameters are 



the inverse of A. To be able to use this approach, A has to be positive definite, or it 
must be modified so that this property holds. Also, the computations can be performed 
efficiently if A is sparse. For the QTL analysis problems, the IBD matrix A is often 
only semi-definite and not necessarily sparse. Here the values of ai^2 are given by the 
solution of the minimization problem 




y = Xb + Rr + e, 



(2) 



var{y) = V = alA + a^I = aiA + (J2I. 



computed from the mixed-model equations (MME) ( [Lynch and Walsh, 1998 ) using a 



(3) 



Min L, 



s.t. ai > 



(4) 



cr2 > 
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where L is the log-hkehhood for the model ([T]), 

(5) L = -2ln{l) = C + ln{det{V)) + ln{det{X^V-^X)) + yP^y, 
and the projection matrix P is defined by 

(6) P = - V-^X{X^V-^X)-^X^V-\ 

In the original AI-REML algorithm, the minimization problem is solved using the stan- 
dard Newton scheme but where the Hessian is substituted by the average information 
matrix H^^ , whose entries are given by 

The entries of the gradient of L are given by 



(8) —=tr[—P\-y'P—Py ,2 = 1,2. 

dai \dai J dai 

3. Efficient implementation of AI-REML optimization 

The main result of this paper is an algorithm that allows for an efficient implementa- 
tion of the iteration body in the AI-REML algorithm for variance component analysis 
in QTL mapping problems. In (IMishchenko et ah]) , we examine optimization schemes 



for the REML optimization and different approaches for introducing the constraints 
for (Ti^2 in the AI-REML scheme. For cases when the solution to the optimization 
problem is close to a constraint, different ad-hoc fixes have earlier been attempted to 
ensure convergence for the Newton scheme. In (IMishchenko et aLl) . we show that by 
introducing an active-set formulation in the AI-REML optimization scheme, robustness 
and good convergence properties were achieved also for cases when the solution is close 
to a constraint. When computing the results below, we use the active-set AI-REML 
optimization scheme from (IMishchenko et aO . 

From the formulas in the previous section, it is clear that the most computationally 
demanding part of the AI-REML algorithm is the computation of the matrix P, and 
especially the explicit inversion of the matrix V = aiA + a2l, where A is a constant 
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semi-definite matrix and ai^2 are updated in each iteration. If V is regarded as a general 
matrix, i.e. no information about the structure of the problem is used, a standard 
algorithm based on Cholesky factorization with complete pivoting is the only alternative, 
and 0{n^) arithmetic operations are required. If A is very sparse, the work can be 
significantly reduced by using a sparse Cholesky factorization, possibly combined with 
a reordering of the equations. This can be compared to the approach taken e.g. in 



(Bates, (2004)), where a sparse factorization is used for A when solving the mixed- 
model equations. However, as remarked earlier A is only semi-definite and not very 
sparse for the QTL analysis problems and this approach is not an option here. 

The IBD matrix A is a function of the location in the genome where the REML model 
is applied. The key observation leading to a more efficient algorithm for inverting V is 
that the rank of A at a location with complete genetic information only depends on the 
size of the base generation. At such locations, A will be a rank-fc matrix where k <^ n 
in experimental pedigrees with small base generations. If the genetic information is not 
fully complete, A can still be approximated by a low-rank matrix. The error in such an 
approximation can be made small when the genetic distance to a location with complete 
data is small. 

Let a symmetric rank-fc representation of A be given by 



(9) A = ZZ^, 

where Z is an n x k matrix. We now exploit this type of representation to compute 

Case 1: An exact low-rank representation of A is available a-priori. In 

general, the inverse of a matrix plus a low-rank update can be computed using the 



Woodbury formula, see e.g. (Golub and C.Van Loan, (1996)), 



(10) B-^ = {C + SiS^)-^ = c-^ - C-^Si{I + S^C-^Si)~^S^ ■ c-^ 
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Applying ( fTOl) with C = I(J2, Si = Z and 5*2 = ctiZ, we get the following formula for 
computing V~^, 



;ii) v-^ = la^^ - a^^dz{i + dz^zy^z^, 



where a = aia2^. The k x k matrix Z'^Z can be computed once, before the Newton 
iteration is started. Also, the matrix (/ + aZ^Z) is only of size k x k, and its inverse 
can be computed using a standard factorization method in 0{k^) arithmetic operations. 

Case 2: No low-rank representation of A is given a-priori. If only the matrix 
A is given, a low-rank represent at ion/ approximation can be computed once before the 
Newton iteration is started. Then, the Woodbury formula is again used to get an efficient 
computational algorithm for the iterations. Computing a low-rank approximation of a 



matrix is a problem with a long history (Eckart and Young, 1936), and the standard tool 



is the truncated singular value decomposition, see e.g. (Golub and C.Van Loan, (1996) 



Press et al., 1997), which computes the optimal approximation in Frobenius norm. In 
our case, A is symmetric and positive semi-definite, and the SVD is equivalent to the 
standard spectral decomposition. Also, the spectral decomposition can be written as 
A = WAW^ = (W^/A){W^/A)'^, where A = diag(Aj) , i = l,...,n. Hence, we compute 
a low-rank representation of A using a truncated spectral decomposition, 



(12) A={WtVAt){WtVAt)\ 

where At is a diagonal matrix of k eigenvalues and Wt is a rectangular matrix of size nxk 
containing the corresponding orthogonal eigenvectors. Here, At and Wt are obtained by 
eliminating the n — k smallest eigenvalues and the corresponding eigenvectors from the 
standard spectral decomposition. Such idea is used in Genetic Principal Component 



Analysis, see e.g. (Kirpatrick and Meyer, 2004) 



Again using the Woodbury formula ( fTOl) . now with C = Ia2, Si = Wty/A^ and 
5*2 = (7iWt\/Al, we compute V~'^ according to. 



(13) = la^^ - a2^^{Wt^/At){I + ^At)-\Wt^/A 



T 
tj ■ 
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Or in equivalent form, 
(14) V^^ = la^^ - a^^aWtiAt^ + aiy^W^^. 

A similar approach for matrix inversion has been used for problems in Gaussian pro- 



cess regression with applications in e.g. machine learning, see (Schwaighofer and Tresp, 2002 



Rasmussen and Williams, 2005 ). However, as far as we know, an approach based on a 
truncated spectral decomposition and formula (fT3l) has not been been pursued for New- 
ton iterations for REML-problems before. 

When performing the computation described by (fT4l) . the matrix Wt is constant and 
can be precomputed before the iteration starts. The matrix (A^^ -|- al) is diagonal, so 
its inverse is trivially computable. 

The factorization ffT2|) is a rank-Zc representation of A. This is exact if A has at least 
n — k zero eigenvalues, and then V^^ = V~^. For a general IBD matrix A, more than k 
eigenvalues will normally be non-zero, resulting in that V~'^ is an approximation of V~^. 
The accuracy of this approximation depends on the quality of the truncated factorization 
f ll2l) for A. For the QTL analysis problems, we need to chose the rank k such that the 
accuracy of the solution to the REML problem, i.e. the variance components ai^2, is 
sufficient. 

Assume that the eigenvalues are ordered such that Ai > A2 > ■ ■ ■ > A„. Several cri- 
teria for choosing k can be considered. One option is to simply set A; to a predetermined 
value, either motivated by the known value of the rank at a fully informative position in 
the genome or determined as a given percentage of n. Such a choice has the advantage 
that it is possible to pre-determine the computational work needed for computation of 
V"'^ in ffT^ . Another easily applied criterion for determining k is given by 

(15) k = kij) = {max i : Xi> rAi}, 



where the threshold r is for example chosen as r = 0.001. This means that all eigenvalues 
of the matrix A that are smaller than O.OOlAi are neglected and the corresponding 
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columns of W in the spectral decomposition are deleted. In the numerical experiments 
presented in the next section, we use the truncation criterion (|T5il . 

For computing the truncated spectral decomposition, we currently use a standard 
eigenvalue solver, compute all eigenvalues and eigenvectors, and then ignore the data 
connected to eigenvalues smaller than given by the truncation criterion. For problems 
with large data matrices where it is known that only a small number of eigenvalues 
will be large, an alternative can be to use e.g. the Lanczos iteration for computing 
these. Potentially, this can reduce the arithmetic work for computing the truncated 
decomposition before the iteration starts. 

Once has been computed using either ( fT^ or ( fTTll . formula (Ej) is used to compute 
P, and the entries of the Hessian H^^ and the gradient of L are computed using the 
formulas (JTj) and ([H]). The matrices involved have common terms which are computed 
once at the beginning of the current iteration to reduce the total complexity. 



4. Numerical Results 
In this section we present numerical experiment computed using experimental IBD 



matrices of size 767 x 767 drawn from a chicken population ( |Kerje et al., 2003| ). For 
these first pilot investigations of the computational performance we have chosen to 
implement the algorithms in Matlab, and we present execution time measurements for 
the Matlab scripts. At a later stage, we will implement the most promising algorithms 
also in C, which will reduce the execution time further. In this case, we expect that 
the relative gain from using the new algorithms will be larger than indicated below. 
Matlab's native implementation of standard inversion is presumably more efficient than 
our Matlab scripts for the new algorithms, which are interpreted at runtime. 

Case 1: We first present numerical results for the case when the IBD matrices A have 
exact low-rank representations given a-priori. Let Tc be the measured computational 
time for the computation of V^^ within the AI-REML scheme using standard inversion, 
and let Tw be the corresponding time when using the low-rank representation and 
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formula (|TT|l . In Figure [H we show Tc/Ty/, i.e. the speedup for the computations of 
V~^, as a function of the rank k of the matrices A. 

The speedup achieved for computing V"^ by using the iow-rank representation of A 

25 I 1 , , , , , 1 



20 - 



5- 




qI 1 \ \ \ \ \ 

100 200 300 400 500 600 700 

Ranl< k of the iow-rank representation of A 

Figure 1. The speedup achieved for the computations of in the 
AI-REML scheme achieved by using the scheme based on fill I) instead of 
standard inversion 

From Figure [1], it is clear that for IBD matrices of size 767 x 767, the implementation 
using the low-rank representation is faster if the rank k is smaller than approximately 
350. Also, for k < 50, the speedup is more than one order of magnitude. 

In Table [1] we present the total CPU-times for the AI-REML scheme and the CPU- 
time for the computations of V^^ within this scheme for different values of k and number 
of iterations equal to 9. The total CPU-times for the AI-REML are denoted by T^^f* and 
T^*, while the CPU-times for the computation of V^^^ are denoted by Tc and Ty/. 

From Table [1] it is clear that for the problems studied here, computing using 
standard inversion accounts for approximately half of the computational time in the 
AI-REML scheme. For small values of A;, the time needed for inverting using the 
low-rank representation can effectively be ignored, and the improved algorithm runs 
approximately twice as fast as the original scheme using standard inversion. 
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Table 1. Total CPU-time for the AI-REML computations and CPU- 
time for computing for the schemes using standard inversion and a 
low-rank representation 

Case 2: We now turn to case 2. Here a low-rank representation of A is not avail- 
able a-priori, and a truncated eigenvalue decomposition must be computed before the 
AI-REML iteration starts. We present results from computations performed using 18 
different IBD matrices corresponding to genetic locations where a low-rank represen- 
tation is not given. The first set of results shows the computational time Tc^tw for 
computing within the AI-REML scheme. In this case we compare standard in- 
version (C) to the scheme given by f[T3|) . based on truncated spectral decomposition 
(TW). 

In Table [2], we show the CPU-time and the number of AI-REML iterations for the two 
schemes. For the scheme based on the truncated spectral decomposition, the table also 
shows the rank k as determined by the truncation criterion ( fT5l) and the relative errors 
(in %) for the variance components (co-ia) arising from using a low-rank approximation 
of A. For the matrices An and ^412, the variance component ai is zero, and it is 
not possible to compute the relative error. For these matrices, the absolute errors in 
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Table 2. CPU-time for computing V ^ within the AI-REML iteration, 



and errors arising from exploiting a low-rank approximation of A 

0"! is very small for both values of r. From the results in Table [2], it is clear that 
the accuracy of the variance components does not only depend on the value of the 
parameter r, but also on the properties of the matrix A. When an approximation of 
is employed in the computations, the optimization landscape is affected in different 
ways for different IBD matrices A, resulting in different behavior for the optimization 
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scheme and different errors in the location of the optima. However, for the problems 
studied here, choosing r = 0.001 results in that the variance components are determined 
with sufficient accuracy for all the IBD matrices examined. 

In the next set of experiments we study the speedup and error introduced by using 
an approximate inverse of V in some more detail for the two IBD matrices Ai and ^13. 
As can be seen from Table Ej Ai is harder to approximate with a low rank matrix than 
A13. In Tables |3] and m we vary the value of the parameter r in the truncation criterion 
(1151) and show the resulting rank k of the approximation of A, the number of iterations 
in the AI-REML optimization scheme, the speedup Tq/Ttw, and the relative errors in 
the variance components. 
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Table 3. Results for the IBD matrix Ai for different values of the trun- 
cation parameter r 

In Figure [2] we finally present total timings for solving the AI-REML problems using 
the new scheme exploiting a truncated spectral decomposition, including the time needed 
for computing the truncated factorization prior to the AI-REML iterations. We also 
compare these timings to the corresponding results for the standard scheme using direct 
inversion. In the figure, the timings for all matrices Ai — Ais are shown as a function of 
the number of AI-REML iterations required. When several matrices result in the same 
number of iterations, the cpu time for each matrix and the average result are shown. 
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Table 4. Results for the IBD matrix ^413 for different values of the trun- 
cation parameter r 



Average Total CPU time 

30 1 1 ^ 1 1 ^ 1— 




1 I \ I I \ I \ \ I I 

4 6 8 10 12 14 16 18 20 22 24 

Number of iterations 

Figure 2. Total CPU-times for solving the AI-REML problems using 
the scheme based on standard inversion of V and the new scheme using a 
truncated spectral decomposition 



14 Kateryna Mishchenko, Sverker Holmgren and Lars Ronnegard 

Theoretically, the average cpu time for the scheme exploiting the truncated spectral 
decomposition should be a straight line. Deviations are due to the inconsistence of time 
measurement in Mat lab. 

From Figure [2], we see that the standard scheme using direct inversion is faster when 
number of iterations is small, i.e. less than 9 iterations. The reason for this is that the 
spectral decomposition of the matrix A is relatively costly, and it must be amortized over 
a number of iterations before the faster iterations begin to pay of. When the number of 
iterations required is significantly larger than 9, which is often the case for real- world 
problems, the new algorithm is significantly faster also when no low-rank representations 
of the IBD matrices are not available a priori. 

5. Conclusions 

In this paper we present a family of algorithms that allow for an efficient imple- 
mentation of the iteration body in the AI-REML algorithm for variance component 
analysis in QTL mapping problems. Combined with the improved optimization scheme 
in (IMishchenko et aLl) . the new algorithms form a basis for an efficient and robust AI- 
REML scheme for evaluating variance component QTL models. 

The most costly operation in the AI-REML iteration body is the explicit inversion 
of the matrix V = aiA + 021-, where the IBD matrix A is constant and positive semi- 
definite, and o"! > and > are updated in each iteration. The key information 
enabling the introduction of improved algorithms is that the rank of A at a location 
in the genome with complete genetic information only depends on the size of the base 
generation. At such locations, A will be a rank-A; matrix where k n, and by exploiting 
the Woodbury formula the inverse of V can be computed more efficiently than by using 
a standard algorithm based on Cholesky factorization. If the genetic information is 
not fully complete, a general IBD matrix A can still be approximated by a low-rank 
matrix and the error in such an approximation can be made small when the genetic 
distance to a location with complete data is small. More importantly, there might be 
a possibility for setting up a low-rank representation of the matrix A also at genetic 
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locations where the information is not complete. This is a topic of current investigation 
( Ronnegard and Carlborg, 2007 ). 

We present results for IBD matrices A from a real data set for two different settings; 
Firstly, we show that if a low-rank representation of A is available a-priori, the inversion 
of V using the new algorithms is performed faster than using a standard algorithm if 
the rank k is smaller than approximately 350. Also, for k < 50, the speedup is more 
than one order of magnitude. 

Then we also show that, even if a low-rank representation is not directly available and 
a low-rank approximation of A needs to be computed before the AI-REML iterations, 
significant speedup of the variance component model computations can still be achieved. 
For QTL mapping problems, the efficiency of our new method will increase when the 
ratio between the total pedigree size and base generation size increases, the density and 
informativeness of markers increases. Hence, the relative efficacy of the method will 
continuously increase in the future with deeper pedigrees and more markers. Also, we 
are currently developing a scheme for directly constructing a low-rank representation 
of the IBD matrix A also between markers, which will result in that the eigenvalue 
factorization prior to the AI-REML iterations is not needed any more. 
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